Abstract. It is well known that there is a sharp density threshold for a random r-SAT formula
to be satisfiable, and a similar, smaller, threshold for it to be satisfied by the pure literal rule. 
Also, above the satisfiability threshold, where a random formula is with high probability (whp) 
unsatisfiable, the unsatisfiability is whp due to a large "minimal unsatisfiable subformula" (MUF). 

By contrast, we show that for the (rare) unsatisfiable formulae below the pure literal threshold, 
the unsatisfiability is whp due to a unique MUF with smallest possible "excess", failing this whp
due to a unique MUF with the next larger excess, and so forth. In the same regime, we give a precise 
asymptotic expansion for the probability that a formula is unsatisfiable, and efficient algorithms 
for satisfying a formula or proving its unsatisfiability. It remains open what happens between the 
pure literal threshold and the satisfiability threshold. We prove analogous results for the k-core and
k-colorability thresholds for a random graph, or more generally a random r-uniform hypergraph.



1. Introduction 

Let r ≥ 3, and consider a random r-SAT formula F with n variables, where each of the $2^r\binom{n}{r}$
possible clauses is present independently with probability $p = a n^{-(r-1)}$. Friedgut [10] showed that
there is a threshold $c_r = c_r(n)$ for satisfiability: for every ε > 0, as n → ∞, if $a < (1 - \varepsilon)c_r$ then F
is with high probability (whp, i.e., asymptotically almost surely) satisfiable, while if $a > (1 + \varepsilon)c_r$
then F is whp unsatisfiable. For unsatisfiable formulae, it is natural (and useful) to ask why. If F 
is unsatisfiable then it has one or more minimal unsatisfiable subformulae (MUFs); these are the
minimal "obstacles" to satisfiability. Chvátal and Szemerédi [5] showed that, in the unsatisfiable
regime (up to very high clause density) a random formula will not contain any small unsatisfiable 
subformula. Thus such a formula is typically unsatisfiable for a non-local reason, which also makes 
it difficult to prove unsatisfiability. 

The aim of this paper is to develop an analogous picture for the rare unsatisfiable r-SAT formulae 
below the satisfiability threshold, and to investigate its algorithmic consequences. We are unable 
to completely characterize unsatisfiable formulae below the satisfiability threshold $c_r$, but we can
do so below the smaller "pure literal" threshold a*. We show that such a formula F is typically
unsatisfiable for a small reason. Specifically, ranking MUFs in terms of excess (r − 1 times the
number of clauses, less the number of variables), only certain excesses are possible, and there are
only finitely many MUFs with any given excess. Theorem 10 asserts that, whp, F contains a
unique MUF, and this MUF has the minimum possible excess. Furthermore, if we condition on F
having no MUF with excess up to i, then whp F still contains a unique MUF, and this MUF has
the minimum possible excess greater than i. Additionally, Theorem 12 gives a precise asymptotic
expansion for the probability of unsatisfiability: it is a power series in 1/n, each of whose coefficients 
is an explicitly computable polynomial evaluated at a. (Failure of the pure literal rule, in place of 
unsatisfiability, is characterized similarly, but in terms of minimal full formulae, MFFs.) 
We also consider failure of the pure literal rule (in place of unsatisfiability), obtaining a similar
characterization, but in terms of minimal full subformulae (in place of minimal unsatisfiable
subformulae), and a similar asymptotic expansion for the probability that the pure literal rule fails.

For random graphs and r-uniform hypergraphs (in place of r-SAT formulae), we develop a 
completely analogous picture for k-colorability and the existence of a nonempty k-core (in place of
satisfiability and failure of the pure literal rule, respectively). 

Algorithmically, our results immediately imply that for a typical unsatisfiable formula in the pure 
literal regime (a typical atypical formula), we can quickly find a witness. Additionally, we show that 
for sufficiently sparse random formulae (possibly below the pure literal threshold), in polynomial 
expected time we can decide satisfiability, output a satisfying assignment for satisfiable formulae, 
and for unsatisfiable formulae, output both an assignment satisfying as many clauses as possible, 
and a minimal unsatisfiable subformula (with corresponding results for hypergraphs). The hope 
is for algorithms efficient up to the pure literal threshold, and if possible up to the satisfiability 
threshold. (That goal was already achieved for the special case of 2-variable clauses, namely the 
class Max 2-CSP encompassing Max Cut, Max 2-SAT, the Ising model, and more. There, the 
two thresholds coincide, and [20] gave an algorithm running in expected linear time, exploiting the
exponentially small probability of components of large excess.) 

Stepping back, our exploration of unsatisfiable formulae in the satisfiable regime is complemen- 
tary to existing explorations of the other three cases. Characterization of unsatisfiable formulae 
in the unsatisfiable regime was the main goal of [5]. Algorithms for satisfiable formulae in the 
unsatisfiable regime are often sought in the "planted" model, but recently there has been success 
in the uniform model [8]. Vast attention has been paid to algorithms for satisfiable formulae in the 
satisfiable regime, and we note just one recent result, [7]. 

A similar type of structural result — where if a likely property fails to hold, it most likely does so 
for a smallest reason, otherwise most likely for a second-smallest reason, and so forth — occurs in 
the context of random triangle-free graphs, although the proofs are completely different. A random 
triangle-free graph is whp bipartite [9], and otherwise can whp be made bipartite by deleting one 
vertex, otherwise whp by deleting two vertices, and so on [18]. It would be interesting to see other
examples of this phenomenon. 

2. Structural results for random instances of r-SAT 

In this section, we prove our results for random instances of r-SAT. In order to prove our main 
result, we must first build up a structural picture of random formulae. Any minimal unsatisfiable
formula must be full (all variables appear both with and without negation), and it turns out to
be simpler to concentrate on full subformulae rather than minimal unsatisfiable subformulae. We
divide our analysis into three ranges: 

• Subformulae of size at most K: In this range, we determine rather precisely the joint 
distribution of full subformulae. 

• Subformulae of size between K and εn: We show that, with failure probability $O(n^{-s})$, there are
no full subformulae in this range.

• Subformulae of size at least εn: We show that, with exponentially small failure probability,
there are no full subformulae in this range (provided the density is below the pure literal 
threshold) . 

Here, we can choose any value for s, and then K and ε > 0 are carefully chosen constants (K must
be sufficiently large in terms of s, and then ε must be sufficiently small in terms of K), while n is
the number of variables. We begin in Section 2.1 by giving definitions. The analysis for the three
ranges is given in Sections 2.2, 2.3 and 2.4; we put the pieces together in Section 2.5.

2.1. Basic definitions and random model. A conjunctive normal form (CNF, or "SAT") for- 
mula consists of a set of literals (signed variables, i.e., variables and their negations) and a set of 
clauses over these literals, each clause comprised of distinct variables with arbitrary signs. In an
r-SAT formula each clause contains r literals; note that for a formula on n variables there are $2^r\binom{n}{r}$
possible r-clauses. A formula F is satisfiable if there is some assignment of True and False values 
to its variables such that each clause contains at least one True literal (a literal corresponding to a 
variable inherits its truth assignment, while the negated variable gets the negated assignment). 

We define a random formula $F \in \mathcal{F}^r_{n,p}$ in analogy with a random graph $G \in \mathcal{G}_{n,p}$, letting each
possible r-clause be present with probability p. We are primarily interested in random formulae 
where the expected number of clauses scales linearly with the number of variables. In any case, we 
work with three parametrizations, given by p, c, and a (all potentially functions of n), related by 

(1) $$p = \frac{cn}{2^r\binom{n}{r}} = a\,n^{-(r-1)},$$

where p is the clause probability, cn is the expected number of clauses, and a is a parametrization 
that is convenient because it is in fixed proportion to p but has the same desirable scaling behavior 
as c, since $a = (1 + O(1/n))\,2^{-r}\,r!\,c$.
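The random model above can be sketched in a few lines of Python. This is only an illustrative sampler, not code from the paper: the names `random_formula` and `density_c`, and the encoding of a clause as a frozenset of signed integers (+v for the variable v, −v for its negation), are assumptions made here. It enumerates all $2^r\binom{n}{r}$ possible clauses, so it is practical only for small n.

```python
import itertools
import math
import random


def random_formula(n, r, a, rng=None):
    """Sample a random r-SAT formula on variables 1..n (illustrative sketch).

    Each of the 2^r * C(n, r) possible clauses is included independently
    with probability p = a * n^{-(r-1)}, capped at 1.
    A clause is a frozenset of nonzero ints: +v is the literal v, -v its negation.
    """
    rng = rng or random.Random()
    p = min(1.0, a * n ** (-(r - 1)))
    formula = set()
    for variables in itertools.combinations(range(1, n + 1), r):
        for signs in itertools.product((1, -1), repeat=r):
            if rng.random() < p:
                formula.add(frozenset(s * v for s, v in zip(signs, variables)))
    return formula


def density_c(n, r, a):
    """The c of relation (1): c * n is the expected number of clauses."""
    p = a * n ** (-(r - 1))
    return p * 2 ** r * math.comb(n, r) / n
```

For example, `density_c(10**6, 3, 0.75)` is close to 1, in line with $a = (1 + O(1/n))\,2^{-r}\,r!\,c$ for r = 3.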

The order |H| of a formula H is the number of variables (not literals); the size e(H) is the number
of clauses. We call a formula empty if it has no clauses, i.e., e(H) = 0. We define the excess of a
formula in analogy with an established definition for hypergraphs, itself a natural extension of the 
excess (of edges over vertices) of a graph: 

(2) $$\mathrm{ex}(H) = (r-1)\,e(H) - |H|.$$
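As a small illustration of definition (2), the following sketch computes order, size and excess. The function names and the signed-integer clause encoding are assumptions of this example, and the variable set is taken to be exactly the variables occurring in some clause, which is slightly narrower than the definition in the text.

```python
def order(formula):
    """Number of distinct variables occurring in the clauses."""
    return len({abs(lit) for clause in formula for lit in clause})


def size(formula):
    """Number of clauses e(H)."""
    return len(formula)


def excess(formula, r):
    """ex(H) = (r - 1) * e(H) - |H|, as in (2)."""
    return (r - 1) * size(formula) - order(formula)


# Hypothetical example: (x1 v x2 v x3) and (~x1 v ~x2 v ~x3).
H = {frozenset({1, 2, 3}), frozenset({-1, -2, -3})}
print(order(H), size(H), excess(H, 3))
```

For this H the order is 3, the size is 2, and the excess is 2·2 − 3 = 1.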

Two order-n formulae H and H' are isomorphic if there is a remapping of their variables and their
signs (under the action of the obvious group with $2^n n!$ elements) taking one to the other. An automorphism of H is an
isomorphism between H and itself, and we write aut H for the automorphism group.

H is a (proper) subformula of F if H's variable and clause sets are subsets of F's (and at least
one of the containments is proper). We shall say that H' is a copy of H in F if H' is a subformula 
of F that is isomorphic to H (note that the isomorphism might involve changing signs). If F has 
any subformula H' isomorphic to H we may simply say that F contains H. 

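For formulae H and F, we write $X_H(F)$ for the number of copies of H in F. For a random formula
$F \in \mathcal{F}^r_{n,p}$, recalling (1) and (2) and using the falling factorial notation $(n)_k = n(n-1)\cdots(n-k+1)$,

$$E X_H = \frac{1}{|\mathrm{aut}\,H|}\,(n)_{|H|}\,2^{|H|}\,\bigl(a\,n^{-(r-1)}\bigr)^{e(H)}$$

(3) $$= \frac{2^{|H|}}{|\mathrm{aut}\,H|}\,a^{e(H)}\,(n)_{|H|}\,n^{-(r-1)e(H)}$$

(4) $$= (1 + O(1/n))\,\frac{2^{|H|}}{|\mathrm{aut}\,H|}\,a^{e(H)}\,n^{-\mathrm{ex}(H)}.$$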

We say that a literal of F is pure if its complement does not appear in any clause of F. The 
pure literal rule chooses a pure literal of F (if there is any), and produces a smaller formula F' by 
deleting the literal's variable from F's set of variables, and deleting all clauses containing the literal 
from F's set of clauses. Note that F is satisfiable iff F' is, and if F is satisfiable then a satisfying 
assignment for F can be recovered from a satisfying assignment to F' by setting the selected literal 
True. The pure literal rule succeeds if F is eventually reduced to an empty formula, for then it 
produces a satisfying assignment for F; otherwise it is said to fail (and no conclusion can be drawn 
about the satisfiability of the original formula). 
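The pure literal rule described above can be sketched as follows. This is a minimal illustration under the assumptions of this example (clauses as frozensets of signed integers, the hypothetical name `pure_literal_rule`, and an arbitrary choice among the available pure literals), not the authors' implementation.

```python
def pure_literal_rule(formula):
    """Repeatedly pick a pure literal, set it True, and delete its clauses.

    Returns (residual_clauses, assignment). The rule succeeds iff the
    residual set of clauses is empty; otherwise no conclusion is drawn.
    """
    clauses = set(formula)
    assignment = {}
    while True:
        literals = {lit for clause in clauses for lit in clause}
        pure = [lit for lit in literals if -lit not in literals]
        if not pure:
            break
        lit = pure[0]                      # any pure literal will do
        assignment[abs(lit)] = lit > 0     # make the chosen literal True
        clauses = {c for c in clauses if lit not in c}
    return clauses, assignment
```

On success, variables left unassigned may be set arbitrarily, since every clause was deleted because it contained a literal already set to True.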

We call a formula H full if it is nonempty and has no pure literals (i.e., every variable and 
complemented variable of H appears in some clause); we say that H is a full formula (FF). We 
call a formula H a minimal full formula (MFF), if H is full and has no full proper subformula. It 
is well known, and easy to see, that, regardless of how the pure literal rule chooses pure literals, it
fails on F iff F contains a full subformula or equivalently iff F contains a MFF. 

We call a formula H a minimal unsatisfiable formula (MUF) if H is unsatisfiable and contains 
no unsatisfiable proper subformula. It is clear that F is unsatisfiable iff it contains a MUF (F may 
itself be a MUF, or may properly contain one or more MUFs), and that a MUF is necessarily a 
FF. For a formula F, a contained MUF can be thought of as an obstruction to F's satisfiability, 
and a contained MFF as an obstruction to satisfying F using the pure literal rule. We will be 
interested in the probability that a random formula contains MUFs and MFFs of various sizes, and 
in particular whether typical obstructions are large or small. 
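For very small formulae the definitions of FF and MUF can be checked by brute force. The sketch below is illustrative only: the helper names are hypothetical, clauses are frozensets of signed integers, the variable set is taken to be the variables that occur in clauses, and the satisfiability test tries all $2^{|H|}$ assignments. Since unsatisfiability is preserved when clauses are added, minimality is tested by deleting one clause at a time.

```python
import itertools


def is_full(formula):
    """Nonempty, and every literal that occurs has its complement occurring too."""
    literals = {lit for clause in formula for lit in clause}
    return bool(formula) and all(-lit in literals for lit in literals)


def is_satisfiable(formula):
    """Exhaustive search over all truth assignments (exponential time)."""
    variables = sorted({abs(lit) for clause in formula for lit in clause})
    for values in itertools.product((False, True), repeat=len(variables)):
        truth = dict(zip(variables, values))
        if all(any(truth[abs(l)] == (l > 0) for l in c) for c in formula):
            return True
    return False


def is_muf(formula):
    """Unsatisfiable, but satisfiable after deleting any single clause."""
    formula = set(formula)
    if is_satisfiable(formula):
        return False
    return all(is_satisfiable(formula - {c}) for c in formula)
```

For instance, the formula consisting of all 8 sign patterns on three variables is a MUF (each assignment falsifies exactly one clause), and it is full, as every MUF must be.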

2.2. Small subformulae. We begin by considering subformulae of constant size, and give fairly 
precise results for their distribution. These results hold for random formulae of any bounded density 
c = c(n) = O(1) (equivalently a = a(n) = O(1)).

Lemma 1. Suppose that r ≥ 3. If H is full then ex(H) > 0. Furthermore, for every s > 0, there
are (up to isomorphism) only finitely many full formulae H with ex(H) = s.

Proof. If H is a full formula of order t, then by definition each of the t variables of H must occur
at least twice (once with each sign) in the clauses of H. The clauses of H therefore contain at least 2t
literal occurrences, r per clause, so e(H) ≥ 2|H|/r, which implies

ex(H) ≥ 2(r − 1)|H|/r − |H| = (r − 2)|H|/r.

Since r > 2, this is strictly positive, the lemma's first assertion. Flipping the inequality, if ex(H) = s
then |H| ≤ rs/(r − 2), which implies that there are only finitely many possibilities for H. □
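As a worked instance of Lemma 1 (an illustration added here, not part of the original argument), take r = 3 and ask which full formulae have excess 1. The bound from the proof forces |H| ≤ 3, and a single 3-clause already uses 3 variables, so |H| = 3 and e(H) = 2:

```latex
\[
  \operatorname{ex}(H) \;\ge\; \frac{(r-2)\,|H|}{r} \;=\; \frac{|H|}{3}
  \quad (r = 3),
  \qquad\text{so}\qquad
  \operatorname{ex}(H) = 1 \;\Longrightarrow\; |H| \le 3 .
\]
% With |H| = 3, definition (2) gives e(H) = (ex(H) + |H|)/(r - 1) = (1 + 3)/2 = 2.
\[
  H \;=\; (x_1 \lor x_2 \lor x_3) \,\land\, (\lnot x_1 \lor \lnot x_2 \lor \lnot x_3),
  \qquad
  \operatorname{ex}(H) \;=\; 2 \cdot 2 - 3 \;=\; 1 .
\]
```

Fullness forces the two clauses to carry complementary signs on every variable, so up to isomorphism this H is the only full formula of excess 1 when r = 3. It is a minimal full formula, but it is satisfiable and hence not a MUF.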

Since every MUF is a FF, there are also finitely many MUFs of each excess. 

The following proposition shows that fullness plays a role somewhat like that of the strict balance
condition for graphs (see for example [2, Chapter IV]). A strictly balanced graph is one where every
proper subgraph has strictly smaller density (ratio of edges to potential edges), and this can be used 
to show that a union of two strictly balanced graphs of equal density is a graph with strictly greater 
density. Here we have a property of a stronger type: the union of two non-nested full formulae 
(with possibly different excesses) is a formula with excess strictly greater than that of either. 

Proposition 2. Suppose that r > 2. For full formulae $H_1$ and $H_2$ with $H_1 \not\subseteq H_2$, $\mathrm{ex}(H_1 \cup H_2) \ge
\mathrm{ex}(H_2) + 1$.

Proof. If $V(H_1) \subseteq V(H_2)$ then $|H_1 \cup H_2| = |H_2|$ while $e(H_1 \cup H_2) > e(H_2)$, implying $\mathrm{ex}(H_1 \cup H_2) >
\mathrm{ex}(H_2)$. Since ex is integer-valued, this implies $\mathrm{ex}(H_1 \cup H_2) \ge \mathrm{ex}(H_2) + 1$.

Otherwise, let $t = |V(H_1) \setminus V(H_2)| > 0$. Then $H_1 \cup H_2$ contains 2t more literals than $H_2$,
and therefore contains at least 2t/r more clauses. So $\mathrm{ex}(H_1 \cup H_2) \ge \mathrm{ex}(H_2) + (r-1)2t/r - t =
\mathrm{ex}(H_2) + (r-2)t/r > \mathrm{ex}(H_2)$. Since ex is integer-valued, this implies $\mathrm{ex}(H_1 \cup H_2) \ge \mathrm{ex}(H_2) + 1$. □

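Claim 3. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n) = O(1), and let $F \in \mathcal{F}^r_{n,p}$ be a random
formula. For any fixed, full formula H,

$$P(\exists \text{ a copy of } H \text{ in } F) = (1 + O(1/n))\,\frac{2^{|H|}}{|\mathrm{aut}\,H|}\,a^{e(H)}\,n^{-\mathrm{ex}(H)}.$$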

Proof. With $X_H$ the number of copies of H in F, the probability in question is $P(\exists \text{ a copy of } H \text{ in } F) =
P(X_H > 0)$. It follows from inclusion-exclusion that

(5) $$E X_H \;\ge\; P(X_H > 0) \;\ge\; E X_H - \tfrac{1}{2}\,E X_H(X_H - 1).$$

We will exploit Proposition 2 to show that $E X_H(X_H - 1)$ is small compared with $E X_H$.
We know already from (4) that

(6) $$E X_H = (1 + O(1/n))\,\frac{2^{|H|}}{|\mathrm{aut}\,H|}\,a^{e(H)}\,n^{-\mathrm{ex}(H)}.$$
Note that $X_H(X_H - 1)$ is the number of ordered pairs $(H_1, H_2)$ of distinct (but possibly overlapping)
copies of H in F. Let $\mathcal{H}$ be the set of isomorphism classes of all formulae $H' = H_1 \cup H_2$ with
$H_1$ and $H_2$ isomorphic to H. Note that $\mathcal{H}$ is a finite collection of formulae and depends on H alone,
not F or n: to enumerate $\mathcal{H}$ it suffices to consider formulae $H_1$ and $H_2$ on variables 1, ..., 2|H|.
Each copy in F of $(H_1, H_2)$ corresponds in a 1-to-1 fashion to a copy in F of some $H' \in \mathcal{H}$ along
with a covering of H' by an ordered pair $(H_1, H_2)$ where $H_1$ and $H_2$ are both subformulae of H'
and are both isomorphic to H. For $H' \in \mathcal{H}$, let b(H') denote the number of ways of writing H' as
a union of an ordered pair $(H_1, H_2)$ of subformulae of H' that are copies of H. Then we have

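$$E[X_H(X_H - 1)] = \sum_{H' \in \mathcal{H}} b(H')\,E(X_{H'}(F))$$

$$= (1 + O(1/n)) \sum_{H' \in \mathcal{H}} b(H')\,\frac{2^{|H'|}}{|\mathrm{aut}\,H'|}\,a^{e(H')}\,n^{-\mathrm{ex}(H')} \qquad \text{(by (4))}$$

$$\le (1 + O(1/n)) \Bigl(\sum_{H' \in \mathcal{H}} b(H')\,\frac{2^{|H'|}}{|\mathrm{aut}\,H'|}\Bigr)\,a^{e(H)+1}\,n^{-(\mathrm{ex}(H)+1)}$$

$$= O(1)\,a^{e(H)+1}\,n^{-(\mathrm{ex}(H)+1)}$$

$$= O(a/n)\,E[X_H],$$

where the inequality uses Proposition 2, the following equality uses that the set $\mathcal{H}$ is independent
of F, and the final line similarly uses that the factor $2^{|H|}/|\mathrm{aut}\,H|$ in $E[X_H]$ (see (6) again) is independent of F,
and a = O(1).

With (5) and (6) this establishes the claim. □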

Claim 3 already tells us something about the likelihood of small subformulae. Medium and large
subformulae will be treated in subsequent sections, but while we are considering fixed subformulae
we give two more lemmas that will be used for the structural results of Theorems 10 and 11.

Lemma 4. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n) = O(1), and let $F \in \mathcal{F}^r_{n,p}$ be a random
formula. Let $H_1$ and $H_2$ be fixed full formulae. Then

$$P(F \text{ contains non-nested copies of } H_1 \text{ and } H_2) = O\bigl(n^{-\max\{\mathrm{ex}(H_1),\,\mathrm{ex}(H_2)\}-1}\bigr).$$

Proof. Let $\mathcal{H}$ be the set of all isomorphism classes of unions of a copy of $H_1$ and a copy of $H_2$, where
the two copies are not nested. By Proposition 2, any $H' \in \mathcal{H}$ has $\mathrm{ex}(H') \ge \max\{\mathrm{ex}(H_1), \mathrm{ex}(H_2)\} + 1$,
and so the assertion follows from Claim 3 by summing over $\mathcal{H}$. (As in the previous proof, $\mathcal{H}$ is a
finite set, and is independent of F and n.) □

Lemma 5. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n) = O(1), and let $F \in \mathcal{F}^r_{n,p}$ be a random
formula. If $H_1, \ldots, H_k$ are distinct FFs then

$$P(F \supseteq H_1 \mid F \not\supseteq H_2, \ldots, F \not\supseteq H_k) = (1 + O(1/n))\,P(F \supseteq H_1).$$

Proof. First consider the case of just two FFs. Because $H_1$ and $H_2$ are distinct, they cannot be
nested, and so we can use Lemma 4. Let $E_i$ be the event that F contains a copy of $H_i$. Then

$$P(E_1 \mid \lnot E_2) = \frac{P(E_1 \cap \lnot E_2)}{P(\lnot E_2)} = \frac{P(E_1) - P(E_1 \cap E_2)}{1 - P(E_2)} = (1 + O(1/n))\,P(E_1),$$

where the last equality follows from Claim 3 and Lemma 4.
In the general case,

$$P\Bigl(\bigcap_{i=2}^{k} \lnot E_i\Bigr) \;\ge\; 1 - \sum_{i=2}^{k} P(E_i) = 1 - O(1/n).$$

Also,

$$P\Bigl(E_1 \cap \bigcap_{i=2}^{k} \lnot E_i\Bigr) \;\ge\; P(E_1) - \sum_{i=2}^{k} P(E_1 \cap E_i) = P(E_1) - O(P(E_1)/n),$$

where the last equality follows from Claim 3 and Lemma 4. Combining,

$$P\Bigl(E_1 \Bigm| \bigcap_{i=2}^{k} \lnot E_i\Bigr) = \frac{P\bigl(E_1 \cap \bigcap_{i=2}^{k} \lnot E_i\bigr)}{P\bigl(\bigcap_{i=2}^{k} \lnot E_i\bigr)} = (1 + O(1/n))\,P(E_1).$$

□ 

2.3. Medium subformulae. We now turn to a middle range of subformula size, namely between a 
large constant and a small linear size. Once again, our results hold at all densities with a bounded. 
The following is the sort of bound computed in [5].

Lemma 6. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n), and let $F \in \mathcal{F}^r_{n,p}$ be a random formula.
For $1 \le t \le n/(2a^{1/(r-1)})$, the probability that F contains any full subformula with t variables is at
most

(7) $$\bigl(4^{(r-1)/r}\,e\,a^{2/r}\,(t/n)^{1-2/r}\bigr)^{t}.$$

Proof. Let the set of variables be $v_1, \ldots, v_n$. We order all 2n literals as $v_1 < \lnot v_1 < v_2 < \lnot v_2 < \cdots$.
A full subformula H of F with order t must contain at least 2t/r clauses. We let $s = \lceil 2t/r \rceil$ and
define a subformula H* = H*(H) with s clauses as follows. Let L be the set of 2t literals occurring
in clauses of H. Let $x_1$ be the smallest literal in L, and let $C_1$ be the lexicographically smallest
clause of H (sorting the literals within each clause as above) that contains $x_1$. For i = 2, ..., s, let $x_i$
be the smallest literal in L that does not appear in any $C_j$, j < i, and let $C_i$ be the lexicographically
smallest clause of H that contains $x_i$. ($x_i$ is well defined since we are always excluding literals from
at most s − 1 clauses, which together contain at most (s − 1)r < 2t distinct literals.) We then take
H* to be the conjunction of $C_1, \ldots, C_s$.

Over all full formulae H on a given set of t variables, the number of formulae H* = H*(H) is
at most $\binom{2t}{r-1}^s$ (there are at most $\binom{2t}{r-1}$ choices for each $C_i$, as it is forced to contain $x_i$), so the
number of formulae of type H* that could possibly be subformulae of F is at most $\binom{n}{t}\binom{2t}{r-1}^s$. Let
X be the number of full subformulae of F with order t, and let Y be the number of subformulae of
type H* of F. Then clearly X > 0 implies Y > 0 (if X counts H, then Y counts H*(H)), so

$$P(X > 0) \le P(Y > 0) \le E(Y) \le \binom{n}{t}\binom{2t}{r-1}^{s} p^{s}$$

$$\le (en/t)^{t}\,(2t)^{s(r-1)}\,\bigl(a/n^{r-1}\bigr)^{s}$$

$$= (en/t)^{t}\,\bigl(a\,(2t/n)^{r-1}\bigr)^{s}$$

$$\le (en/t)^{t}\,\bigl(a\,(2t/n)^{r-1}\bigr)^{2t/r},$$

which equals (7). □
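To get a feel for how quickly the bound (7) decays, the following sketch evaluates its logarithm. The function name is hypothetical and the parameter values in the usage note are arbitrary choices for illustration; working with the logarithm avoids floating-point underflow for large t.

```python
import math


def log_full_subformula_bound(n, t, r, a):
    """Natural log of (4^{(r-1)/r} * e * a^{2/r} * (t/n)^{1-2/r})^t, i.e. of (7).

    Only valid in the range 1 <= t <= n / (2 * a^{1/(r-1)}) of Lemma 6.
    """
    if not 1 <= t <= n / (2 * a ** (1 / (r - 1))):
        raise ValueError("t is outside the range covered by Lemma 6")
    base = 4 ** ((r - 1) / r) * math.e * a ** (2 / r) * (t / n) ** (1 - 2 / r)
    return t * math.log(base)
```

With n = 10^6, t = 100, r = 3 and a = 1, the base of the power is roughly 0.32, so the bound is of order e^{-114}; the logarithm is negative exactly when the base is below 1.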

Corollary 7. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n) = O(1), and let $F \in \mathcal{F}^r_{n,p}$ be a random
formula. For any positive integer s, there exist an integer $t_0 > 0$ and a real value $\varepsilon_0 > 0$ such that
the probability that F contains any full subformula with between $t_0$ and $\varepsilon_0 n$ variables is $o(n^{-s})$.

Proof. Since the probability above is increasing in a, it is enough to prove the result for a constant,
replacing a(n) by $a = \max\{\sup_n a(n), 1\}$. We first choose $\varepsilon_0$ small enough that $\varepsilon_0 < 1/(2a^{1/(r-1)})$
(so that any $t \le \varepsilon_0 n$ satisfies the hypothesis of Lemma 6) and that

$$4^{(r-1)/r}\,e\,a^{2/r}\,\varepsilon_0^{\,1-2/r} \le e^{-1}.$$

Thus (7) is at most $e^{-t}$ for all $0 < t \le \varepsilon_0 n$. Summing over t, it follows that the probability that F
contains a full subformula with between 2s log n and $\varepsilon_0 n$ variables is $o(n^{-s})$.
Now let $t_0 = 1 + \lceil sr/(r-2) \rceil$, so that $(1 - 2/r)\,t_0 \ge s + (1 - 2/r)$. For $t_0 \le t \le 2s \log n$ and n sufficiently large, (7) is at most

$$\bigl(4^{(r-1)/r}\,e\,a^{2/r}\,(2s\log n/n)^{1-2/r}\bigr)^{t} \le \bigl(O(1)\,(\log n/n)^{1-2/r}\bigr)^{t_0} = O\bigl(n^{-s-(1-2/r)}(\log n)^{(1-2/r)t_0}\bigr) = o\bigl(n^{-s}/\log n\bigr).$$

So the probability that F contains a full subformula with between $t_0$ and 2s log n variables is $o(n^{-s})$.

□ 

2.4. Large subformulae. Finally, we show that large subformulae are unlikely. This is the most 
delicate regime, and we will need to bound a more strictly. Some bound on a is certainly necessary: 
if a lies above the satisfiability threshold then a random subinstance is whp unsatisfiable, but (as 
shown by Chvátal and Szemerédi [5]) whp any unsatisfiable subinstance has size Ω(n). We will
prove that large subformulae are unlikely for a below the pure literal threshold; what happens 
between the two thresholds is an open question. 

Molloy [17] showed that there is a sharp threshold for the pure literal rule. Specifically, for r ≥ 3,
the threshold is³

(8) $$a^* = \min_{y > 0} \frac{(r-1)!\,y}{2^{r-1}\,(1 - e^{-y})^{r-1}}.$$

For any constant a, letting $p = a\,n^{-(r-1)}$ and letting $F \in \mathcal{F}^r_{n,p}$ be a random formula,

$$P(\text{pure literal rule finds a solution}) \to \begin{cases} 1 & \text{if } a < a^*, \\ 0 & \text{if } a > a^*. \end{cases}$$

Achlioptas and Peres showed that, as r → ∞, the threshold for satisfiability (though not
proved to be a constant rather than a function of n) is $c_{\mathrm{sat}} = (1 + o(1))\,2^r \log 2$, leading via (1) to
$a_{\mathrm{sat}} = (1 + o(1))\,r! \log 2$. By setting y = r in (8) one can verify that the thresholds a* and $a_{\mathrm{sat}}$
diverge for large r: the gap in our knowledge of the behavior between the two is a wide one.
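The minimization in (8) is easy to carry out numerically. The sketch below is an illustration only: the function names are hypothetical, and it uses a plain ternary search, which is justified because the derivative of the logarithm of the objective, $1/y - (r-1)/(e^y - 1)$, changes sign exactly once on y > 0. Working in log space keeps the computation stable for larger r.

```python
import math


def log_objective(y, r):
    """log of (r-1)! * y / (2^{r-1} * (1 - e^{-y})^{r-1}), the quantity minimized in (8)."""
    return (math.lgamma(r) + math.log(y)
            - (r - 1) * math.log(2)
            - (r - 1) * math.log(-math.expm1(-y)))


def pure_literal_threshold(r, lo=1e-6, hi=100.0, iters=200):
    """Approximate a* by ternary search on the unimodal objective."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if log_objective(m1, r) < log_objective(m2, r):
            hi = m2
        else:
            lo = m1
    return math.exp(log_objective((lo + hi) / 2, r))
```

For r = 3 this gives a* ≈ 1.2277, which by relation (1) corresponds to a clause density c ≈ 1.637. Comparing `pure_literal_threshold(r)` with `math.factorial(r) * math.log(2)` for growing r gives a rough numerical view of the divergence described above, bearing in mind that the latter expression is only asymptotic in r.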

We need to show that large minimal unsatisfiable subinstances are unlikely; we therefore need a 
large deviation bound for values of a below the satisfiability threshold. We shall need the following 
version of the Azuma-Hoeffding inequality, given by McDiarmid [12].

Lemma 8. Let $X_1, \ldots, X_n$ be independent random variables, with $X_k$ taking values in a set $A_k$ for
each k. Suppose that a measurable function $f : \prod A_k \to \mathbb{R}$ satisfies $|f(x) - f(x')| \le c_k$ whenever
the vectors x and x' differ only in the k-th coordinate. Let Z be the random variable $f(X_1, \ldots, X_n)$.
Then for any t > 0, $P(|Z - EZ| \ge t) \le 2\exp\bigl(-2t^2 / \sum_k c_k^2\bigr)$.
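The right-hand side of Lemma 8 is simple to evaluate, which helps in seeing why it yields an exponentially small probability when the number of coordinates and the deviation t are both linear in n and the $c_k$ are bounded. The function below is a minimal sketch with a hypothetical name; it caps the bound at 1 since it bounds a probability.

```python
import math


def mcdiarmid_tail_bound(t, c):
    """Evaluate 2 * exp(-2 t^2 / sum_k c_k^2), capped at 1.

    t -- deviation from the mean
    c -- sequence of bounded-difference constants c_k, one per coordinate
    """
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / sum(ck * ck for ck in c)))
```

For instance, with M = Θ(n) coordinates, every $c_k$ equal to a constant, and t = εn/8, the exponent is $-2(\varepsilon n/8)^2/(M c_k^2) = -\Theta(n)$.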

We prove the following lemma. 

Lemma 9. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n) satisfies $\sup_n a(n) < a^*$, and let $F \in \mathcal{F}^r_{n,p}$
be a random formula. For every ε > 0 there is δ > 0 such that, for all sufficiently large n,

$$P(F \text{ contains a full subformula of order} \ge \varepsilon n) \le \exp(-\delta n).$$

Proof. Since the probability above is increasing in a, it is enough to prove the result for a constant,
replacing a(n) by $a = \sup_n a(n)$. We will show that, with the required high probability, the pure
literal rule leaves fewer than εn variables, establishing the lemma. (A full subformula is not affected
by the pure literal rule, so if the "kernel" left is small, F contained no large full subformula.)

³An earlier version of the paper, [16], had an erroneous formula a factor of 2 smaller.

Consider the following instantiation of the pure literal rule: Set $F_0 = F$, so $|F_0| = n$. For i ≥ 0,
obtain $F_{i+1}$ from $F_i$ by setting all pure literals to True, and then removing these literals and the
clauses they satisfied. Molloy showed that (for any a < a*) there is a sequence $\lambda_s \to 0$ such that,
for any s,

$$E|F_s| = (1 + o(1))\,\lambda_s n.$$

Let us pick s such that $\lambda_s < \varepsilon/8$. The result will follow from a concentration argument which we
now give in detail. 
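The round-based instantiation of the pure literal rule used in this proof can be sketched as follows. This is illustrative code with a hypothetical name and the signed-integer clause encoding; it returns the orders $|F_0|, |F_1|, \ldots$, counting a variable that occurs in no clause as pure, so that it is removed in the next round.

```python
def peel_rounds(formula, n, rounds):
    """Return [|F_0|, |F_1|, ..., |F_rounds|] for the parallel pure literal rule.

    In each round, every variable with a pure literal is removed together
    with all clauses containing a pure literal.
    """
    variables = set(range(1, n + 1))
    clauses = set(formula)
    orders = [len(variables)]
    for _ in range(rounds):
        literals = {lit for clause in clauses for lit in clause}
        pure_vars = {v for v in variables
                     if v not in literals or -v not in literals}
        pure_lits = {lit for lit in literals if -lit not in literals}
        variables -= pure_vars
        clauses = {c for c in clauses if not (c & pure_lits)}
        orders.append(len(variables))
    return orders
```

Dividing the entries by n for a large random instance gives an empirical view of the ratios $|F_s|/n$, whose expectations are governed by the sequence $\lambda_s$ above.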

A path of length l in F is a sequence $v_0, C_0, v_1, C_1, \ldots, C_{l-1}, v_l$, alternating between variables and
clauses, such that each clause $C_i$ contains the variables that precede and follow it (either with
or without negation). For a variable v and positive integer l, we define the ball $B_l(v)$ to be the
subformula of F containing all variables and clauses that lie on paths of length at most l starting
at v. (Note that each clause in a ball is fully supported by variables in it.)
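The ball $B_l(v)$ can be computed by a breadth-first search that alternates between variables and clauses. The sketch below is an illustration under the assumptions of this example (hypothetical function name, clauses as frozensets of signed integers); it returns the variable set and clause set of the ball.

```python
from collections import defaultdict


def ball(formula, v, radius):
    """Variables and clauses on paths of length at most `radius` starting at v."""
    clauses_of = defaultdict(set)          # variable -> clauses containing it
    for clause in formula:
        for lit in clause:
            clauses_of[abs(lit)].add(clause)

    variables, clauses, frontier = {v}, set(), {v}
    for _ in range(radius):
        # Clauses touching the newest variables extend each path by one step.
        new_clauses = {c for u in frontier for c in clauses_of[u]} - clauses
        clauses |= new_clauses
        reached = {abs(lit) for c in new_clauses for lit in c}
        frontier = reached - variables
        variables |= frontier
    return variables, clauses
```

Every clause added brings all of its variables into the ball, which matches the remark that each clause in a ball is fully supported by variables in it.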

It is part of Molloy's argument, and clear with a little thought, that the event that v belongs
to $F_s$ depends only on $B_s(v)$. We shall say that a variable v is good if it has the following two
properties:

• v does not belong to $V(F_s)$ (the set of variables of $F_s$), and

• no variable in $B_s(v)$ belongs to more than K(s, a) clauses.

Here, K(s, a) is a constant chosen sufficiently large that the second property holds with probability
at least 1 − ε/8. There exists such a K(s, a) independent of n because the scaling of (1) was
chosen precisely to make the local structure of an instance independent of n. For a simple rigorous
argument, the degree of any variable in $B_s(v)$ is at most $|B_{s+1}(v)|$; $E[|B_{s+1}(v)|]$ is obtained by
multiplying the number of paths by their probability of being present and has an upper bound
independent of n; and taking K(s, a) to be 8/ε times this value, the desired probability follows
from Markov's inequality.

Since the first property occurs with probability $1 - \lambda_s + o(1)$, we see that for large enough n, v is
good with probability greater than 1 − ε/4. We will prove that, with failure probability exp(−δn),
there are at least (1 − ε)n good variables. Now note that the pure literal rule can never set a variable
belonging to a full subformula. Thus if H is a full subformula of F then $V(H) \subseteq \bigcap_{i \ge 0} V(F_i)$. In
particular, $V(H) \subseteq V(F_s)$ and so no good variable can belong to a full subformula. The claimed
result is then immediate.

To prove our concentration bound, we first claim that changing a single clause in an instance
cannot change the number of good variables by more than $2r^{s+1}K^s$. (This is the purpose of the
second goodness condition.) Suppose we add a clause C to an instance I to obtain an instance I'.
If adding C spoils a variable v (v is good in I but not in I'), C must contain some variable
$u \in B_s(v)$. Choose a shortest path P from u to v. P has length at most s, and $P \subseteq I$ (it is
shortest, so it doesn't contain C), thus $P \subseteq B_s(v)$, and since v was good in I, P contains no
variables with degree (in I) more than K. Generating all paths of this sort, there are r choices for
the variable u ∈ C, and from each variable at most K choices for the following clause and r choices
for the succeeding variable, so there are at most $r^{s+1}K^s$ such paths, and at most that many spoiled
variables. Therefore, adding a clause can decrease the number of good variables by at most $r^{s+1}K^s$,
and similarly deleting a clause can create at most $r^{s+1}K^s$ good variables. The claim follows.

Finally, to use the Azuma-Hoeffding inequality (Lemma 8) we need to argue in terms of a fixed
number of clauses. For this purpose we note that goodness is a monotonic property (if v is not
good, adding clauses cannot make it good), and couple the original model to one with a fixed
and typically larger number of clauses. Specifically, first observe that the probability of being good
is a continuous function of a (increasing a slightly adds a small linear number of new clauses, each
of which spoils at most $r^{s+1}K^s$ good variables, a small fraction of the nearly n such variables).
We can therefore choose a' > a such that in an instance with clause probability $p' = a'\,n^{-(r-1)}$,
each variable is good with probability at least 1 − ε/3. Let $p'' = (p + p')/2$ and $M = \lfloor p''\,2^r\binom{n}{r} \rfloor$.
Define an M-clause model $\mathcal{F}^r_{n,M}$ where we sample M clauses uniformly with replacement from
the set of all possible clauses, then discard duplicates (because of which this is not exactly the
analogue of the usual $\mathcal{G}_{n,M}$ model). It is easy to check that, for some $\delta_0 > 0$, with probability
$1 - O(\exp(-\delta_0 n))$, an instance of $\mathcal{F}^r_{n,p}$ has fewer clauses than one of $\mathcal{F}^r_{n,M}$, which in turn has fewer
clauses than one of $\mathcal{F}^r_{n,p'}$. There is therefore a coupling between the three models in which, with
probability $1 - O(\exp(-\delta_0 n))$, the corresponding random formulae satisfy $F_p \subseteq F_M \subseteq F_{p'}$.

We now complete the argument. By Lemma 8 (with $X_i = C_i$, the i-th sampled clause), in $\mathcal{F}^r_{n,M}$, with probability at least
$1 - O(\exp(-\delta_1 n))$ the number of good variables is within εn/8 of its expectation. By the coupling
with $\mathcal{F}^r_{n,p'}$, this expectation is at least (1 − ε/2)n (we inflate the ε/3 slightly to compensate for the
exponentially small failure probability). So in $\mathcal{F}^r_{n,M}$, with exponentially small failure probability,
we get at least (1 − 2ε/3)n good variables. Finally, the coupling with $\mathcal{F}^r_{n,p}$ shows that, with
exponentially small failure probability, we get at least (1 − ε)n good variables. □

2.5. Main results. Consider the set of all MUFs. Order the set of values for excess as $\mathrm{ex}_1 <
\mathrm{ex}_2 < \cdots$ (by Lemma 1, these values are some subset of the positive integers). For s > 0, we write
$\mathcal{F}_s$ for the set of MUFs F' with $\mathrm{ex}(F') = \mathrm{ex}_s$; note that by Lemma 1 each $\mathcal{F}_s$ is finite.

Theorem 10. Fix i > 0. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n) = Θ(1) satisfies $\sup_n a(n) <
a^*$, and let $F \in \mathcal{F}^r_{n,p}$ be a random formula. If we condition on the event that F is unsatisfiable and
contains no MUF F' with $\mathrm{ex}(F') < \mathrm{ex}_i$ then, with probability 1 − O(1/n), the following statements
hold:

(i) F contains a unique MUF $F_0$.

(ii) $F_0 \in \mathcal{F}_i$.

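(iii) For each $F' \in \mathcal{F}_i$, we have $P(F_0 \cong F') \sim \dfrac{2^{|F'|}a^{e(F')}}{|\mathrm{aut}\,F'|} \Big/ Z$, where $Z = \sum_{F'' \in \mathcal{F}_i} \dfrac{2^{|F''|}a^{e(F'')}}{|\mathrm{aut}\,F''|}$.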

Proof. This will follow by combining results from previous sections. Let C be the condition that F
contain no MUF F' with $\mathrm{ex}(F') < \mathrm{ex}_i$ (but not that F is unsatisfiable).

Choose $t_0$ large enough and $\varepsilon_0 > 0$ small enough so that Corollary 7 applies with $s = \mathrm{ex}_i + 1$.
Together with Lemma 9 (with $\varepsilon = \varepsilon_0$), we conclude that the probability that $F \in \mathcal{F}^r_{n,p}$ contains
any full subformula on more than $t_0$ vertices is $o(n^{-s})$. This is also true after conditioning, since
for any event E, $P(E \mid C) = P(E \wedge C)/P(C) \le P(E)/P(C) = (1 + O(1/n))\,P(E)$.

There are finitely many possibilities for minimal unsatisfiable subformulae on $t_0$ or fewer vertices.
From Claim 3 and Lemma 5, for any $F_0$ with $\mathrm{ex}(F_0) \ge \mathrm{ex}_i$, $P(F \supseteq F_0 \mid C) = (1 + O(1/n))\,P(F \supseteq
F_0) = (1 + O(1/n))\,\frac{2^{|F_0|}}{|\mathrm{aut}\,F_0|}\,a^{e(F_0)}\,n^{-\mathrm{ex}(F_0)}$. When $F_0 \in \mathcal{F}_i$, i.e., $\mathrm{ex}(F_0) = \mathrm{ex}_i$, this is a relatively likely
event, with probability $\Theta(n^{-\mathrm{ex}_i})$; otherwise it is O(1/n) less likely.

For any two MUFs $F_1$ and $F_2$ with $\mathrm{ex}(F_1), \mathrm{ex}(F_2) \ge \mathrm{ex}_i$, $P(F \text{ contains non-nested copies of } F_1 \text{ and } F_2 \mid
C) = (1 + O(1/n))\,P(F \text{ contains non-nested copies of } F_1 \text{ and } F_2) = O(n^{-\mathrm{ex}_i - 1})$ by Lemma 4.

Now condition on the event that F is unsatisfiable, i.e., that at least one of the above cases
occurs. Then the middle case, with $\mathrm{ex}(F_0) = \mathrm{ex}_i$, dominates the other cases. □

The same proof gives the analogous statement for minimal full subformulae. Consider the set of 
all MFFs, and order the set of values for excess as $\mathrm{ex}'_1 < \mathrm{ex}'_2 < \cdots$; again, these values are some
subset of the positive integers. For s > 0, we write $\mathcal{F}'_s$ for the set of MFFs F' with $\mathrm{ex}(F') = \mathrm{ex}'_s$;
note that by Lemma 1 each $\mathcal{F}'_s$ is finite.

Theorem 11. Fix i > 0. Let r ≥ 3, let $p = a\,n^{-(r-1)}$ where a = a(n) = Θ(1) satisfies $\sup_n a(n) <
a^*$, and let $F \in \mathcal{F}^r_{n,p}$ be a random formula. If we condition on the event that F contains a full
subformula, but no full subformula F' with $\mathrm{ex}(F') < \mathrm{ex}'_i$ then, with probability 1 − O(1/n), the
following statements hold:

(i) F contains a unique minimal full subformula $F_0$.

(ii) $F_0 \in \mathcal{F}'_i$.

(in) For each F' G T[, we have P(F F') ~ "^fT / Z > where Z = ^FeJ' "unlt^ ■ 

We can also write an asymptotic expansion for the probability that $F$ is unsatisfiable or that the
pure literal rule fails (i.e., that $F$ has a full subformula).

Theorem 12. Let $r \ge 3$, let $p = a n^{-(r-1)}$ where $a = a(n) = O(1)$ satisfies $\sup_n a(n) < a^*$, and
let $F \in \mathcal{F}_{n,p}$ be a random formula. For every full formula $H$ there is a sequence of polynomials
$p_1^{(H)}, p_2^{(H)}, \ldots$ with rational coefficients such that, for any $s_{\max}$,

(9) $\mathbb{P}(F \text{ contains a copy of } H) = \sum_{s=1}^{s_{\max}} p_s^{(H)}(a)\, n^{-s} + O(n^{-s_{\max}-1})$.

Furthermore, there is a sequence of polynomials $p_1, p_2, \ldots$ with rational coefficients such that, for
any $s_{\max}$ and any $a < a^*$,

(10) $\mathbb{P}(F \text{ is unsatisfiable}) = \sum_{s=1}^{s_{\max}} p_s(a)\, n^{-s} + O(n^{-s_{\max}-1})$,

and similarly a sequence $p'_1, p'_2, \ldots$ such that

(10') $\mathbb{P}(\text{the pure literal rule fails on } F) = \sum_{s=1}^{s_{\max}} p'_s(a)\, n^{-s} + O(n^{-s_{\max}-1})$.

Proof. Fix $s_{\max}$ and $a$. Note that (3) can be written as

(11) $\mathbb{E} X_H = a^{e(H)}\, p_H(1/n)$,

where $p_H$ is a polynomial of degree $\mathrm{ex}(H)$. The $k$th factorial moment of $X_H$ is a sum of expectations
$\mathbb{E} X_{H'}$ over configurations $H'$ consisting of the union of $k$ distinct copies of $H$, and so is a sum of
expressions like (11).

Now for $k \ge 1$, $\mathbb{P}(X_H = k)$ and $\mathbb{P}(X_H \ge k)$ can be written as alternating sums in the factorial
moments (see [21 Section 1.4]), and these sums satisfy the alternating inequalities. If $K$ is fixed
and sufficiently large then the $K$th factorial moment has value $O(n^{-s_{\max}-1})$, as all its constituent
configurations have excess larger than $s_{\max}$. Thus we can truncate our sum after a constant number
of terms, with error $O(n^{-s_{\max}-1})$. Each term is of form (11), so we obtain an expression of form (9).

We obtain (10) similarly. Let $\mathcal{F}$ be the set of minimal unsatisfiable subformulae whose excess is at
most $s_{\max}$, and let $X$ be the number of subformulae of $F$ that belong to $\mathcal{F}$. As in the previous case,
asymptotic expansions for the factorial moments of $X$ all have the same form as before, and once again applying
inclusion-exclusion (and noting that we again have the alternating inequalities), truncating at the
$n^{-s_{\max}}$ terms gives an asymptotic expansion of form (10). Minimal unsatisfiable subformulae of
excess greater than $s_{\max}$ can be incorporated into the $O(n^{-s_{\max}-1})$ term by Lemmas 7 and 9. The
argument for (10') is identical, just phrased in terms of minimal full subformulae rather than minimal
unsatisfiable subformulae. □
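
The inclusion-exclusion step used in this proof can be illustrated numerically. The sketch below is not taken from the paper: it uses a toy random variable $X \sim \mathrm{Binomial}(m, q)$ (an assumption chosen because its factorial moments are explicit, $\mathbb{E}[(X)_k]/k! = \binom{m}{k} q^k$) and checks that the partial sums of $\sum_{k \ge 1} (-1)^{k+1}\, \mathbb{E}[(X)_k]/k!$ alternately over- and under-estimate $\mathbb{P}(X \ge 1)$, which is what justifies truncating the sum after a constant number of terms. The function name `bonferroni_partial_sums` is hypothetical.

```python
from fractions import Fraction
from math import comb


def bonferroni_partial_sums(binomial_moments):
    """Partial sums of sum_{k>=1} (-1)^(k+1) * B_k, where B_k = E[(X)_k] / k!.

    The full sum equals P(X >= 1); the alternating (Bonferroni) inequalities
    say that truncating after an odd number of terms gives an upper bound
    and after an even number of terms a lower bound.
    """
    sums, total = [], Fraction(0)
    for k, b_k in enumerate(binomial_moments, start=1):
        total += b_k if k % 2 == 1 else -b_k
        sums.append(total)
    return sums


# Toy example (assumption): X ~ Binomial(m, q), so E[(X)_k] / k! = C(m, k) * q^k.
m, q = 6, Fraction(1, 10)
moments = [comb(m, k) * q**k for k in range(1, m + 1)]
sums = bonferroni_partial_sums(moments)
exact = 1 - (1 - q) ** m  # P(X >= 1) computed directly

for k, partial in enumerate(sums, start=1):
    side = "upper" if k % 2 == 1 else "lower"
    print(f"terms={k}: {float(partial):.6f} ({side} bound), exact={float(exact):.6f}")
```

In the setting of the proof, the $k$th term is of order $n^{-(\text{excess})}$, so the truncation error is controlled by the first omitted term in the same way.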

Let us note that it is only a finite (if tedious) computation to determine the polynomials $p_s^{(H)}$, $p_s$, and
$p'_s$ for any given $H$ and $s$.

3. Structural results for sparse random graphs and hypergraphs 

We now prove results on the $k$-core and $k$-colorability of a sparse random graph or hypergraph.
The definitions, results, and proofs here precisely parallel those of Section 2.

We write $\mathcal{G}_r(n,p)$ for the random $r$-uniform hypergraph model analogous to $\mathcal{G}(n,p)$: a hypergraph
$G \in \mathcal{G}_r(n,p)$ has vertex set $[n]$, and each possible edge (of size $r$) is independently present with
probability $p$. We work with the scaling

(12) $p = \frac{cn}{\binom{n}{r}} \approx a n^{-(r-1)}$,

where $p$ is the edge probability, $cn$ is the expected number of edges, and $a$ is a convenient
parametrization.

For an $r$-uniform hypergraph $H$ we define

$\mathrm{ex}(H) = (r - 1)\,e(H) - |H|$.

We say that $H$ is $k$-dense if it has minimal degree $\delta(H) \ge k$. The hypergraph $k$-core is defined in
the usual way, for example via the process detailed in the proof of Lemma 20, and it is $k$-dense. $H$
is a minimal $k$-dense hypergraph if it is nonempty and has no proper $k$-dense subhypergraph.
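
As a concrete illustration of these definitions, the following minimal sketch computes the excess and tests $k$-density for a hypergraph given as a list of edges. The representation (edges as sets of vertex labels, with $|H|$ taken to be the number of vertices that occur in some edge) and the function names are assumptions made for the example, not notation from the paper.

```python
from itertools import combinations


def excess(edges, r):
    """ex(H) = (r - 1) * e(H) - |H|, with |H| the number of vertices covered by edges."""
    vertices = set().union(*edges) if edges else set()
    return (r - 1) * len(edges) - len(vertices)


def min_degree(edges):
    """Minimum degree over vertices that occur in at least one edge (0 if no edges)."""
    degree = {}
    for edge in edges:
        for v in edge:
            degree[v] = degree.get(v, 0) + 1
    return min(degree.values()) if degree else 0


def is_k_dense(edges, k):
    """H is k-dense if its minimum degree is at least k."""
    return min_degree(edges) >= k


# Example: the complete graph K_4 viewed as a 2-uniform hypergraph.
k4 = [frozenset(e) for e in combinations(range(4), 2)]
print(excess(k4, 2), is_k_dense(k4, 3), is_k_dense(k4, 4))
```

For $K_4$ this gives excess $6 - 4 = 2$, and the graph is 3-dense but not 4-dense.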

Pittel, Spencer and Wormald [19] determined the threshold for the appearance of a $k$-core in
a random graph $G \in \mathcal{G}(n, c_k^*/n)$. They further showed that, for any fixed $c < c_k^*$ and $\varepsilon > 0$, the
probability that $G \in \mathcal{G}(n, c/n)$ has a $k$-core of size bigger than $\varepsilon n$ is at most $\exp(-n^{\delta})$ (in fact, they
did rather more). Molloy [17] determined the $k$-core threshold $a^{**} = a^{**}_{k,r}$ for a random $r$-uniform
hypergraph $G \in \mathcal{G}_r(n, a n^{-(r-1)})$ and proved that for any fixed $a < a^{**}$ and $\varepsilon > 0$, the probability
that $\mathcal{G}_r(n, a n^{-(r-1)})$ has a $k$-core of size bigger than $\varepsilon n$ approaches 0.

Let us write $X_H(G)$ for the number of copies of $H$ in $G$. Then

(13) $\mathbb{E} X_H = (1 + O(1/n))\, \frac{1}{|\mathrm{aut}\,H|}\, a^{e(H)}\, n^{-\mathrm{ex}(H)}$.

Lemma 13. Suppose that $r, k \ge 2$ and $r + k > 4$. If $H$ is a $k$-dense, $r$-uniform hypergraph then

$\mathrm{ex}(H) \ge \frac{(k-1)(r-1) - 1}{r}\, |H|$.

Furthermore, for every $s > 0$, there are (up to isomorphism) only finitely many $k$-dense, $r$-uniform hypergraphs $H$
with $\mathrm{ex}(H) = s$.

Proof. If $\delta(H) \ge k$ then $e(H) \ge k|H|/r$ and so

$\mathrm{ex}(H) \ge k|H|(r-1)/r - |H| = \frac{(k-1)(r-1) - 1}{r}\, |H|$.

So if $\mathrm{ex}(H) = s$ then $|H| \le rs/[(k-1)(r-1) - 1]$, which implies that there are only finitely many
possibilities for $H$. □

Note that the $k$-core is necessarily $k$-dense. It follows that there are only finitely many possible
$k$-cores of each excess.

Proposition 14. Suppose that $r, k \ge 2$ and $r + k > 4$. For $k$-dense, $r$-uniform hypergraphs $H_1$ and
$H_2$, with $H_1 \not\subseteq H_2$, $\mathrm{ex}(H_1 \cup H_2) \ge \mathrm{ex}(H_2) + 1$.

Proof. If $V(H_1) \subseteq V(H_2)$ then $|H_1 \cup H_2| = |H_2|$ while $e(H_1 \cup H_2) > e(H_2)$, implying $\mathrm{ex}(H_1 \cup H_2) > \mathrm{ex}(H_2)$,
which by integrality means $\mathrm{ex}(H_1 \cup H_2) \ge \mathrm{ex}(H_2) + 1$.

Otherwise, let $t = |V(H_1) \setminus V(H_2)| > 0$. Then $H_1 \cup H_2$ contains at least $kt/r$ more edges than $H_2$
(since each vertex in $V(H_1) \setminus V(H_2)$ is incident with at least $k$ edges). So $\mathrm{ex}(H_1 \cup H_2) \ge \mathrm{ex}(H_2) + kt(r-1)/r - t > \mathrm{ex}(H_2)$.
Since ex is integer-valued, this implies $\mathrm{ex}(H_1 \cup H_2) \ge \mathrm{ex}(H_2) + 1$. □

Claim 15. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n) = O(1)$, and let $G \in \mathcal{G}_r(n,p)$
be a random hypergraph. For any fixed $k$-dense, $r$-uniform hypergraph $H$,

$\mathbb{P}(\exists \text{ a copy of } H \text{ in } G) = (1 + O(1/n))\, \frac{1}{|\mathrm{aut}\,H|}\, a^{e(H)}\, n^{-\mathrm{ex}(H)}$.

Proof. With $X_H$ the number of copies of $H$ in $G$, we have from (13) that

$\mathbb{E} X_H = (1 + O(1/n))\, \frac{1}{|\mathrm{aut}\,H|}\, a^{e(H)}\, n^{-\mathrm{ex}(H)}$,

while

$\mathbb{E}[X_H (X_H - 1)] = O(1)\, a^{e(H)+1}\, n^{-(\mathrm{ex}(H)+1)} = O(a/n)\, \mathbb{E}[X_H]$,

and the rest of the proof follows as for Claim 3. □

Lemma 16. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n) = O(1)$, and let $G \in \mathcal{G}_r(n,p)$
be a random hypergraph. Let $H_1$ and $H_2$ be fixed $k$-dense, $r$-uniform hypergraphs. Then

$\mathbb{P}(G \text{ contains non-nested copies of } H_1 \text{ and } H_2) = O\!\left(n^{-\max\{\mathrm{ex}(H_1),\, \mathrm{ex}(H_2)\} - 1}\right)$.

Proof. Another proof without changes. □

Lemma 17. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n) = O(1)$, and let $G \in \mathcal{G}_r(n,p)$
be a random hypergraph. If $H_1, \ldots, H_s$ are distinct minimal $k$-dense, $r$-uniform hypergraphs (or
minimal non-$k$-colorable $r$-uniform hypergraphs) then

$\mathbb{P}(G \supseteq H_1 \mid G \not\supseteq H_2, \ldots, G \not\supseteq H_s) = (1 + O(1/n))\, \mathbb{P}(G \supseteq H_1)$.

Proof. Another proof without changes. □

Lemma 18. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n)$, and let $G \in \mathcal{G}_r(n,p)$
be a random hypergraph. For $1 \le t \le n / a^{1/(r-1)}$, the probability that $G$ contains any $k$-dense
subhypergraph with $t$ vertices is at most

(14) $\left( e\, a^{k/r}\, (t/n)^{k(r-1)/r - 1} \right)^{t}$.

Proof. We modify the proof of the corresponding lemma for formulae in Section 2. Order the vertices as $v_1 < v_2 < \cdots$. A $k$-dense
subhypergraph $H$ of $G$ with order $t$ must contain at least $kt/r$ edges. We let $s = \lceil kt/r \rceil$ and define a
subhypergraph $H^*$ of $H$ with $s$ edges as follows. Let $L$ be the set of $t$ vertices occurring in edges
of $H$. Let $x_1$ be the smallest vertex in $L$, and let $C_1$ be the lexicographically smallest edge of $H$
(sorting the vertices within each edge as above) that contains $x_1$. For $i = 2, \ldots, s$, let $x_i$ be the
smallest vertex in $L$ that is not covered $k$ times by $C_j$, $j < i$, and let $C_i$ be the lexicographically
smallest edge of $H$ that contains $x_i$. (This is well defined since we are always excluding at most
$s - 1$ edges, which together contain at most $(s-1)r < kt$ vertex occurrences.) We then take $H^*$ to
be the edge set $C_1, \ldots, C_s$.

The number of hypergraphs of type $H^*$ that could possibly be subhypergraphs of $G$ is at most
$\binom{n}{t} \binom{t}{r-1}^{s}$. Let $X$ be the number of $k$-dense subhypergraphs of $G$ with order $t$, and let $Y$ be the
number of subhypergraphs of type $H^*$ of $G$. Then $X > 0$ implies $Y > 0$, so

$\mathbb{P}(X > 0) \le \mathbb{P}(Y > 0) \le \mathbb{E}(Y) \le \binom{n}{t} \binom{t}{r-1}^{s} p^{s}$
$\qquad \le (en/t)^{t}\, (t^{r-1})^{s}\, (a/n^{r-1})^{s}$
$\qquad = (en/t)^{t}\, \left(a\,(t/n)^{r-1}\right)^{s}$
$\qquad \le (en/t)^{t}\, \left(a\,(t/n)^{r-1}\right)^{kt/r}$,

which equals (14). □
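
To get a feel for the size of this bound, the sketch below evaluates it numerically. It assumes the final line of the chain of inequalities reads $(en/t)^t \left(a (t/n)^{r-1}\right)^{kt/r}$, which simplifies to $\left(e\, a^{k/r} (t/n)^{k(r-1)/r - 1}\right)^t$; the function names and the sample parameters are hypothetical and chosen only for illustration.

```python
import math


def kdense_bound(n, t, a, r, k):
    """Simplified form: (e * a^(k/r) * (t/n)^(k(r-1)/r - 1))^t."""
    if not 1 <= t <= n / a ** (1 / (r - 1)):
        raise ValueError("bound is only claimed for 1 <= t <= n / a^(1/(r-1))")
    return (math.e * a ** (k / r) * (t / n) ** (k * (r - 1) / r - 1)) ** t


def kdense_bound_unsimplified(n, t, a, r, k):
    """Last line of the chain: (e*n/t)^t * (a * (t/n)^(r-1))^(k*t/r)."""
    return (math.e * n / t) ** t * (a * (t / n) ** (r - 1)) ** (k * t / r)


# Illustrative parameters (assumptions): graphs (r = 2), 3-dense subgraphs on t = 4 vertices.
n, t, a, r, k = 1000, 4, 1.0, 2, 3
print(kdense_bound(n, t, a, r, k), kdense_bound_unsimplified(n, t, a, r, k))
```

The two expressions agree up to floating-point error, and for fixed $t$ the bound decays polynomially in $n$ with exponent $t\,(k(r-1)/r - 1)$.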

Corollary 19. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n) = O(1)$, and let
$G \in \mathcal{G}_r(n,p)$ be a random hypergraph. For any positive integer $s$, there exist an integer $t_0 > 0$ and
a real value $\varepsilon_0 > 0$ such that the probability that $G$ contains a $k$-dense subhypergraph with between
$t_0$ and $\varepsilon_0 n$ vertices is $o(n^{-s})$.

Proof. As before. □

Recall that we defined $a^{**}_{k,r}$ to be the $k$-core threshold for $\mathcal{G}_r(n, a n^{-(r-1)})$.

Lemma 20. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n)$ satisfies $\sup_n a(n) < a^{**}_{k,r}$,
and let $G \in \mathcal{G}_r(n,p)$ be a random hypergraph. For every $\varepsilon > 0$ there is $\delta > 0$ such that, for all
sufficiently large $n$,

$\mathbb{P}(G \text{ contains a } k\text{-dense subhypergraph with order} \ge \varepsilon n) \le \exp(-\delta n)$.

Proof. We follow the argument of Lemma 9. We use the following process for generating the $k$-core:
Set $G_0 = G$, so $|G_0| = n$. For $i \ge 0$, obtain $G_{i+1}$ from $G_i$ by deleting (in a single round) all vertices
of degree at most $k - 1$, and all edges incident on any such vertex. The $k$-core is $G_\infty = G_n$. As
with satisfiability, Molloy showed that (for $a < a^{**}$) there is a sequence $\lambda_s \to 0$ such that, for any
$s$,

$\mathbb{E}|G_s| = (1 + o(1))\, \lambda_s n$.

A ball $B_s(v)$ has the usual hypergraph definition, analogous to the ball definition used for formulae,
and each edge in a ball is fully supported by vertices in it. We shall say that a vertex $v$
is good if it has the following two properties:

• $v$ does not belong to $V(G_s)$, and

• no vertex in $B_s(v)$ has degree more than $K(s, a)$.

The rest of the proof is as before. □
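
The round-based deletion process used in this proof to generate the $k$-core can be sketched as follows. This is an illustrative implementation under assumptions of our own (edges given as sets of vertex labels, isolated vertices ignored, and the function name `k_core` hypothetical); it returns the surviving edges together with the number of deletion rounds performed.

```python
from itertools import combinations


def k_core(edges, k):
    """Repeatedly delete, in a single round, every vertex of degree <= k - 1
    together with all edges incident on such a vertex.

    Returns (edges of the k-core, number of rounds in which something was deleted).
    """
    edges = [frozenset(e) for e in edges]
    rounds = 0
    while True:
        degree = {}
        for edge in edges:
            for v in edge:
                degree[v] = degree.get(v, 0) + 1
        low = {v for v, d in degree.items() if d <= k - 1}
        if not low:
            return edges, rounds
        # Remove every edge that touches a low-degree vertex.
        edges = [edge for edge in edges if not (edge & low)]
        rounds += 1


# Example: K_4 with a pendant path 3-4-5 attached; its 3-core is K_4.
k4 = [frozenset(e) for e in combinations(range(4), 2)]
graph = k4 + [frozenset({3, 4}), frozenset({4, 5})]
core, rounds = k_core(graph, 3)
print(sorted(sorted(e) for e in core), rounds)
```

Because every round removes at least one vertex, the process stabilizes after at most $n$ rounds, matching $G_\infty = G_n$ in the text.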

Consider the set of all minimal non-$k$-colorable hypergraphs, order the set of values for excess as
$\mathrm{ex}_1 < \mathrm{ex}_2 < \cdots$, and let $\mathcal{G}_i$ be the set of minimal non-$k$-colorable hypergraphs with excess $\mathrm{ex}_i$. Similarly, let
the minimal $k$-dense hypergraphs have excesses $\mathrm{ex}'_1 < \mathrm{ex}'_2 < \cdots$ and let $\mathcal{G}'_i$ be the set of minimal
$k$-dense hypergraphs with excess $\mathrm{ex}'_i$. Then we have the analogues of Theorems 10, 11 and 12 by
the same reasoning.

Theorem 21. Fix $i > 0$. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n) = \Theta(1)$
satisfies $\sup_n a(n) < a^{**}_{k,r}$, and let $G \in \mathcal{G}_r(n,p)$ be a random hypergraph. If we condition on the
event that $G$ is non-$k$-colorable and contains no minimal non-$k$-colorable $G'$ with $\mathrm{ex}(G') < \mathrm{ex}_i$ then,
with probability $1 - O(1/n)$, the following statements hold:

(i) $G$ contains a unique minimal non-$k$-colorable $G_0$.
(ii) $G_0 \in \mathcal{G}_i$.

(iii) For each $G' \in \mathcal{G}_i$, we have $\mathbb{P}(G_0 = G') \sim \frac{a^{e(G')}}{|\mathrm{aut}\,G'|} \big/ Z$, where $Z = \sum_{G' \in \mathcal{G}_i} \frac{a^{e(G')}}{|\mathrm{aut}\,G'|}$.

Theorem 22. Fix $i > 0$. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n) = \Theta(1)$ satisfies
$\sup_n a(n) < a^{**}_{k,r}$, and let $G \in \mathcal{G}_r(n,p)$ be a random hypergraph. If we condition on the event that
$G$ contains a nonempty $k$-core, but no nonempty $k$-core $G'$ with $\mathrm{ex}(G') < \mathrm{ex}'_i$ then, with probability
$1 - O(1/n)$, the following statements hold:

(i) $G$ contains a unique minimal nonempty $k$-core $G_0$.
(ii) $G_0 \in \mathcal{G}'_i$.

(iii) For each $G' \in \mathcal{G}'_i$, we have $\mathbb{P}(G_0 = G') \sim \frac{a^{e(G')}}{|\mathrm{aut}\,G'|} \big/ Z$, where $Z = \sum_{G' \in \mathcal{G}'_i} \frac{a^{e(G')}}{|\mathrm{aut}\,G'|}$.

Theorem 23. Let $r, k \ge 2$, $r + k > 4$, let $p = a n^{-(r-1)}$ where $a = a(n) = O(1)$ satisfies $\sup_n a(n) < a^{**}_{k,r}$, and let
$G \in \mathcal{G}_r(n,p)$ be a random hypergraph. For every $k$-dense hypergraph $H$ there is a sequence of polynomials
$p_1^{(H)}, p_2^{(H)}, \ldots$ with rational coefficients such that, for any $s_{\max}$,

(15) $\mathbb{P}(G \text{ contains a copy of } H) = \sum_{s=1}^{s_{\max}} p_s^{(H)}(a)\, n^{-s} + O(n^{-s_{\max}-1})$.

Furthermore, there is a sequence of polynomials $p_1, p_2, \ldots$ with rational coefficients such that, for
any $s_{\max}$ and any $a < a^{**}_{k,r}$,

(16) $\mathbb{P}(G \text{ is non-}k\text{-colorable}) = \sum_{s=1}^{s_{\max}} p_s(a)\, n^{-s} + O(n^{-s_{\max}-1})$,

and similarly a sequence $p'_1, p'_2, \ldots$ such that

(16') $\mathbb{P}(G \text{ has a nonempty } k\text{-core}) = \sum_{s=1}^{s_{\max}} p'_s(a)\, n^{-s} + O(n^{-s_{\max}-1})$.



4. Conclusion 

4.1. Examples. For graphs, i.e., hypergraphs with $r = 2$, any $k$-dense graph on $n$ vertices has
$n \ge k + 1$ (each degree is at least $k$) and at least $kn/2$ edges, thus has excess at least $(k/2 - 1)n$;
this is uniquely minimized by $n = k + 1$ and the graph $K_{k+1}$, with excess $(k-2)(k+1)/2$, $k(k+1)/2$
edges, and $|\mathrm{aut}\,K_{k+1}| = (k+1)!$. Since $K_{k+1}$ is non-$k$-colorable, it is also the unique smallest
non-$k$-colorable graph. Thus for $k \ge 3$, $a < a^{**}_{k,2}$, and $G \in \mathcal{G}(n, a/n)$,

$\mathbb{P}(G \text{ is not } k\text{-colorable}) = (1 + O(1/n))\, \frac{1}{(k+1)!}\, a^{k(k+1)/2}\, n^{-(k-2)(k+1)/2}$, and
$\mathbb{P}(G \text{ has a nonempty } k\text{-core}) = (1 + O(1/n))\, \frac{1}{(k+1)!}\, a^{k(k+1)/2}\, n^{-(k-2)(k+1)/2}$.

Furthermore, if $G$ has nonempty $k$-core then with probability $1 + O(1/n)$ its $k$-core is a single copy
of $K_{k+1}$; the same conclusion follows if $G$ is not $k$-colorable.
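
The arithmetic behind this example is easy to check mechanically. The sketch below computes the edge count and excess of $K_{k+1}$ and the leading-order term $\frac{1}{(k+1)!}\, a^{k(k+1)/2}\, n^{-(k-2)(k+1)/2}$ in exact rational arithmetic; it ignores the $(1 + O(1/n))$ factor, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def clique_edges(k):
    """Number of edges of K_{k+1}."""
    return k * (k + 1) // 2


def clique_excess(k):
    """Excess of K_{k+1} for r = 2: e(H) - |H| = (k - 2)(k + 1) / 2."""
    return clique_edges(k) - (k + 1)


def leading_term(k, a, n):
    """Leading-order probability a^{e} n^{-ex} / |aut|, without the (1 + O(1/n)) factor."""
    return Fraction(a) ** clique_edges(k) / (factorial(k + 1) * Fraction(n) ** clique_excess(k))


for k in (3, 4, 5):
    print(k, clique_edges(k), clique_excess(k), leading_term(k, 1, 100))
```

For $k = 3$ this is the probability scale of finding a copy of $K_4$: $a^6 / (24\, n^2)$.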

For random $r$-SAT formulae, any full formula on $t$ variables has excess at least $(r-2)t/r$, and
this is minimized uniquely by $t = r$ and the formula $F_l$ consisting of the 2 clauses $(X_1, \ldots, X_r)$
and $(\bar{X}_1, \ldots, \bar{X}_r)$, with excess $r - 2$ and $2 \cdot r!$ automorphisms. Thus, for $r \ge 3$ and $F \in \mathcal{F}_{n,p}$,

(17) $\mathbb{P}(\text{the pure literal rule fails to satisfy } F) = (1 + O(1/n))\, \frac{1}{2\, r!}\, a^{2}\, n^{-(r-2)}$.

Furthermore, if the pure literal rule fails to satisfy $F$ then with probability $1 + O(1/n)$ its pure
literal core is a single copy of $F_l$, which is satisfiable, in contrast to the graph case, where we have
seen that the $k$-core is almost surely the non-$k$-colorable graph $K_{k+1}$.



4.2. 2-SAT and 2-CSP. For random formulas, we have assumed throughout that $r \ge 3$, because
this is needed for Lemma 1. Also, we leave unresolved what happens between the pure literal and
satisfiability thresholds. However, much is already known about random 2-SAT, and in this case
the thresholds are equal, both having $a = 1/2$. Chvátal and Reed [4] show that a 2-SAT formula
is unsatisfiable iff it contains a "bicycle", and it is straightforward to compute the likelihoods of
bicycles of various sizes. Our earlier paper [20] exploited the typically small size of the 2-core of a
random graph $G \in \mathcal{G}(n, a/n)$ with $a < 1$ (a threshold above which the core jumps to linear size) to
give an algorithm running in expected time $O(n)$ for "random" instances of any Max 2-CSP below
this threshold; the class of optimization problems Max 2-CSP includes Max 2-Sat and Max Cut.



4.3. Very sparse instances. For very sparse instances ($a \to 0$ very quickly), our results need a
little modification, as the preference order for small subinstances must be changed. For instance, if
$p = n^{-\log n}$ then FFs will appear primarily in order of the number of clauses and only secondarily
in terms of number of variables (rather than in terms of excess).

4.4. Structural results. Our results on small subformulae hold for any constant density. However,
above some threshold large subformulae appear. Our structure theory for random unsatisfiable
formulas applies below the pure literal threshold, because we know there are no large full
subformulas in this range. From the other side, we know by Chvátal and Szemerédi [5] (or from our analysis)
that large unsatisfiable subformulae (therefore large full formulae) appear above the satisfiability
threshold. (At a density $a = \Theta(1)$ any constant above the satisfiability threshold, an instance is
whp unsatisfiable [10], but our results for subformulae of small and medium size apply for any
$a = O(1)$, so full [and potentially unsatisfiable] subformulae of up to small linear size occur with
small probability; thus the obstruction to satisfiability must whp be a large minimal unsatisfiable
subformula.) It would be most interesting to know what happens for formulas between the pure
literal and satisfiability thresholds.

Specifically, are large minimal unsatisfiable subformulae unlikely between the two thresholds,
as are large full subformulae below the pure literal threshold? Concretely, let $c_r(n)\, n^{-(r-1)}$ be a
threshold function for $r$-SAT; recall that $c_r(n)$ is believed but not known to converge to a constant.

Question 1. Let $r \ge 3$, $\varepsilon > 0$, and $p = a n^{-(r-1)}$, with $a(n) < (1 - \varepsilon)\, c_r(n)$ for all $n$. Let $q(n)$ be
the probability that a random formula $F \in \mathcal{F}_{n,p}$ contains a minimal unsatisfiable subformula on at
least $\varepsilon n$ vertices. Is $q(n) = n^{-\omega(1)}$?

A positive answer would immediately translate into a proof of a structural theorem. 

4.5. Algorithms. The behavior of algorithms up to the satisfiability threshold is unclear. However, 
it is easy to give algorithms for sufficiently sparse instances. For instance: 

Theorem 24. For all $r$, for all sufficiently small $a$ there is an expected polynomial-time algorithm
to decide the satisfiability of a random formula $F \in \mathcal{F}_{n,p}$, outputting an assignment satisfying as
many clauses as possible and (if $F$ is unsatisfiable) a minimal unsatisfiable subformula.

Proof. This follows from (7), for some $a$ smaller (likely much smaller) than the pure literal threshold
$a^*$.

We first apply the pure literal rule, taking time $O^*(1)$ (a notation that hides factors polynomial
in the input parameters) and leaving a full subformula on $t$ variables (if $t = 0$, $F$ is satisfied and we
are done). If there are $t$ remaining variables, we now try all $2^t$ possible assignments, taking time
$O^*(2^t)$. If $a$ is sufficiently small, then (7) is at most $4^{-t}$ for all $t \ge 1$, and the expected running
time is at most $\sum_{t \ge 1} O^*(1)\, 2^t\, 4^{-t} = O^*(1)$.
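
The first stage of this algorithm (pure literal elimination followed by exhaustive search on the remaining full subformula) can be sketched as below. This is a minimal illustration, not the paper's implementation: it only decides satisfiability and returns a satisfying assignment if one exists, it does not produce a maximum-satisfying assignment or a minimal unsatisfiable subformula, and the clause encoding (nonzero integers, with `-v` the negation of variable `v`) and all function names are assumptions made for the example.

```python
from itertools import product


def pure_literal_core(clauses):
    """Repeatedly set every pure literal to true and delete the clauses it satisfies.

    Returns (remaining clauses, partial assignment). In the remaining formula every
    literal that occurs also occurs negated, i.e. it is a full subformula (or empty).
    """
    clauses = [frozenset(c) for c in clauses]
    assignment = {}
    while True:
        literals = set().union(*clauses) if clauses else set()
        pure = {lit for lit in literals if -lit not in literals}
        if not pure:
            return clauses, assignment
        for lit in pure:
            assignment[abs(lit)] = lit > 0
        clauses = [c for c in clauses if not (c & pure)]


def brute_force(clauses):
    """Try all 2^t assignments of the t variables occurring in the clauses."""
    variables = sorted({abs(lit) for c in clauses for lit in c})
    for values in product([False, True], repeat=len(variables)):
        assign = dict(zip(variables, values))
        if all(any(assign[abs(lit)] == (lit > 0) for lit in c) for c in clauses):
            return assign
    return None


def solve(clauses):
    """Return a satisfying (partial) assignment, or None if the formula is unsatisfiable.

    Variables that never needed a value are left out; any value works for them.
    """
    core, assignment = pure_literal_core(clauses)
    rest = brute_force(core)
    if rest is None:
        return None
    assignment.update(rest)
    return assignment


print(solve([(1, 2, 3), (-1, 2, 4)]))      # satisfied by the pure literal rule alone
print(solve([(1, 2, 3), (-1, -2, -3)]))    # full formula, needs the brute-force stage
```

The brute-force stage costs $2^t$ evaluations, which is why the expected running time hinges on the probability that a full subformula on $t$ variables survives the pure literal rule.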

To produce a minimal unsatisfiable subformula, or list all such subformulas, again we apply pure
literal until we are left with a full subformula with $t$ variables and $s$ clauses. Note that there are at
most $\binom{n}{t} \binom{2^r \binom{t}{r}}{s}$ such formulae, each of which is present with probability at most $p^s$, with $s \ge 2t/r$.
We now look at all $2^s$ subformulae, and for each we check all $2^t$ assignments of our remaining
variables (we can easily order the subformulae so that we can search for a minimal unsatisfiable
subformula). This takes expected time at most

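$\sum_{t \ge 1} \sum_{s \ge 2t/r} 2^{s}\, 2^{t} \binom{n}{t} \binom{2^r \binom{t}{r}}{s} p^{s} \le \sum_{t \ge 1} \sum_{s \ge 2t/r} 2^{s+t} \left(\frac{en}{t}\right)^{t} \left(\frac{e\, 2^r t^r}{s}\right)^{s} \left(\frac{a}{n^{r-1}}\right)^{s}$

$\qquad \le \sum_{t \ge 1} \sum_{s \ge 2t/r} (2e)^{(r+1)s+t} \left(\frac{n}{t}\right)^{t} \left(\frac{t^r}{2t/r}\right)^{s} \left(\frac{a}{n^{r-1}}\right)^{s}$

$\qquad \le \sum_{t \ge 1} \sum_{s \ge 2t/r} (2er)^{2rs}\, a^{s} \left(\frac{t}{n}\right)^{(r-1)s-t}$

$\qquad \le \sum_{t \ge 1} \sum_{s \ge 2t/r} (2er)^{2rs}\, a^{s}$

$\qquad \le \sum_{t \ge 1} 2^{-t}$

$\qquad \le 1$,

provided $a$ is small enough. Since the initial application of pure literal takes time $O^*(1)$ we are
done. □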

If the structural results extend up to the satisfiability threshold, then most unsatisfiable
instances in the satisfiable regime have a small witness, and so can be identified quickly. This would
affirmatively answer the following question.

Question 2. Suppose $\varepsilon > 0$ and $a = a(n) < (1 - \varepsilon)\, c_r(n)$ for all $n$. Is there a polynomial-time
algorithm that, whp, proves unsatisfiability for a random unsatisfiable formula $F \in \mathcal{F}(n, a n^{-(r-1)})$?

More ambitiously, we could hope for algorithms that succeed always, and run in polynomial
expected time (possibly only for smaller densities $a$).

Question 3. Suppose $\varepsilon > 0$ and $a = a(n) < (1 - \varepsilon)\, c_r(n)$ for all $n$. Is there an algorithm that, for
a random unsatisfiable formula $F \in \mathcal{F}(n, a n^{-(r-1)})$, proves unsatisfiability in polynomial expected
time?

4.6. Graphs and hypergraphs. In the graph and hypergraph context, we would like to know
what happens between the $k$-core threshold $a^{**}_{k,r}$ and a $k$-colorability threshold $d_{k,r}(n)\, n^{-(r-1)}$,
recalling that $d_{k,r}(n)$ is believed but not known to converge to a constant. Here the essential
question is the analogue of Question 1: are large minimal non-$k$-colorable subhypergraphs unlikely
between the two thresholds (as large $k$-dense subhypergraphs are below the $k$-core threshold)?

A result like Theorem 23 can easily be proved for hypergraph coloring (see also [6] for results
on coloring sparse random graphs). With $r, k \ge 2$, $r + k > 4$, $\varepsilon > 0$, $p = a n^{-(r-1)}$, and $a(n) < (1 - \varepsilon)\, d_{k,r}(n)$,
there are also the obvious analogues of Questions 2 and 3: are there algorithms
that are efficient (almost always, or in expectation) for $k$-coloring random $r$-uniform hypergraphs
below the $k$-coloring threshold?